Approximation Algorithms for Clustering

Authors

  • Anthony Ian Wirth
Abstract

Clustering items into groups is a fundamental problem in the information sciences. Many typical clustering optimization problems are NP-hard and so cannot be expected to be solved optimally in a reasonable amount of time. Although the use of heuristics is common, in this dissertation we seek approximation algorithms, whose performance ratio in relation to the optimal solution can be guaranteed and whose running time is a polynomial function of the problem instance size.

We start by examining variants of the asymmetric k-center problem. We demonstrate an O(log n)-approximation algorithm for the asymmetric weighted k-center problem. Here, the vertices have weights and we are given a total budget for opening centers. In the p-neighbor variant each vertex must have p (unweighted) centers nearby: we give an O(log k)-bicriteria algorithm using 2k centers, for small p. We also show that the following three versions of the asymmetric k-center problem are inapproximable: priority k-center, k-supplier, and outliers with forbidden centers.

The bulk of the dissertation concerns correlation clustering: clustering a collection of elements based on pairwise judgments of similarity and dissimilarity. The problem instance does not include a distance relation between the elements. We partition the elements into clusters so that the number of pairs correctly (resp. incorrectly) classified with respect to the input judgment labeling is maximized (resp. minimized). It is worthwhile studying both complete instances, in which every pair is labeled, and general instances, in which some input pairs might not have labels.

Specifically, we demonstrate a factor 4 approximation for minimization on complete instances, and a factor O(log n) approximation for general instances. For the maximization version, we give a factor 0.7664 approximation for general instances, noting that a PTAS is unlikely by proving APX-hardness. We also prove the APX-hardness of minimization on complete instances. We provide the first nontrivial approximation algorithm for maximizing the correlation: the difference between the number of pairs correctly classified and the number incorrectly classified. The factor Ω(1/log n) algorithm is derived from an approximation algorithm for maximizing a fairly general type of quadratic program on the unit hypercube.
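The three correlation clustering objectives named in the abstract (agreements, disagreements, and their difference, the correlation) can be made concrete with a small sketch. This is only an illustration of how a given partition is scored, not one of the dissertation's approximation algorithms; the function name `correlation_scores`, the `'+'`/`'-'` label encoding, and the toy instance are all assumptions made for this example.

```python
def correlation_scores(labels, cluster_of):
    """Score a clustering against pairwise similarity judgments.

    labels:     dict mapping a pair (u, v) to '+' (similar) or '-' (dissimilar).
                Unlabeled pairs are simply absent (a "general" instance).
    cluster_of: dict mapping each element to its cluster identifier.

    Returns (agreements, disagreements, correlation), where
    correlation = agreements - disagreements.
    """
    agreements = 0
    disagreements = 0
    for (u, v), sign in labels.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        # A '+' pair is correctly classified when both ends share a cluster;
        # a '-' pair is correctly classified when they are separated.
        if (sign == '+') == same_cluster:
            agreements += 1
        else:
            disagreements += 1
    return agreements, disagreements, agreements - disagreements


# Hypothetical toy instance: the pair (a, d) is deliberately left unlabeled.
labels = {
    ('a', 'b'): '+',
    ('a', 'c'): '+',
    ('b', 'c'): '-',
    ('c', 'd'): '+',
}
cluster_of = {'a': 0, 'b': 0, 'c': 1, 'd': 1}

agreements, disagreements, correlation = correlation_scores(labels, cluster_of)
print(agreements, disagreements, correlation)
```

Maximizing the first value, minimizing the second, and maximizing the third share the same optimal clusterings on a fixed instance, but, as the abstract's differing factors indicate, they behave differently under approximation.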

Similar resources

Exact algorithms for solving a bi-level location–allocation problem considering customer preferences

This paper addresses a bi-level problem in which two rivals compete to attract customers and maximize their profits; that is, competitors seeking market share must compete through the centers that are going to be located in the near future. In this paper, a nonlinear model presented in the literature considering customer preferences is linearized. Customer behavior ...

Signal processing approaches as novel tools for the clustering of N-acetyl-β-D-glucosaminidases

Nowadays, the clustering of proteins, and of enzymes in particular, is one of the most popular topics in bioinformatics. An increasing number of chitinase genes from different organisms, and their sequences, have been identified. So far, various mathematical algorithms have been used for the clustering of chitinase genes, but most of them seem to be confusing and sometimes insufficient. In the...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Efficient Approximation Algorithms for Point-set Diameter in Higher Dimensions

We study the problem of computing the diameter of a set of $n$ points in $d$-dimensional Euclidean space for a fixed dimension $d$, and propose a new $(1+\varepsilon)$-approximation algorithm with $O(n+ 1/\varepsilon^{d-1})$ time and $O(n)$ space, where $0 < \varepsilon \leqslant 1$. We also show that the proposed algorithm can be modified to a $(1+O(\varepsilon))$-approximation algorithm with $O(n+...

A clustering algorithm for categorical data based on combining measures

Clustering is one of the main techniques in data mining. It is a process that partitions a data set into groups so that the data within a cluster are as close to each other as possible and the data in different clusters differ as much as possible. Clustering algorithms are divided into two categories according to the type of data: clustering algorithms for numerical data and clustering algor...

Publication date: 2004